
Add SparseHist wrapper for large multi-systematic histograms #25

Open
bendavid wants to merge 3 commits into WMass:main from bendavid:sparsehists

bendavid (Contributor) commented Apr 9, 2026

Three commits adding a SparseHist wrapper class around scipy
sparse arrays carrying hist axes metadata, plus supporting fixes.

This provides a minimal Python representation of the sparse Boost
histograms produced in C++ by narf, which allows them to be pickled
and/or passed directly to rabbit without creating a dense intermediate.
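To illustrate why no dense intermediate is needed, here is a minimal NumPy-only sketch, assuming the nonzero bins are stored as flat indices into the with-flow dense layout together with their values (the same idea as the `_flat_indices`, `_values` and `_dense_shape` attributes in the `h_sparse.__dict__` output in the review below). `MiniSparseHist` and its methods are hypothetical, not the PR's API, and a plain shape tuple stands in for the hist axes.

```python
import pickle
from dataclasses import dataclass

import numpy as np


@dataclass
class MiniSparseHist:
    """Hypothetical stand-in: nonzero bins of a dense with-flow array."""

    dense_shape: tuple  # with-flow shape, i.e. axis.extent per axis
    flat_indices: np.ndarray  # C-order flat indices into dense_shape
    values: np.ndarray  # bin contents at those indices

    @classmethod
    def from_dense(cls, dense):
        # Keep only nonzero bins; their positions are recorded as flat indices.
        flat = dense.ravel()
        idx = np.flatnonzero(flat)
        return cls(tuple(dense.shape), idx.astype(np.int64), flat[idx])

    @property
    def nnz(self):
        return self.flat_indices.size

    def toarray(self):
        # Dense reconstruction; only sensible when the dense size is small.
        out = np.zeros(int(np.prod(self.dense_shape)), dtype=self.values.dtype)
        out[self.flat_indices] = self.values
        return out.reshape(self.dense_shape)


# 1D example: 20 regular bins plus underflow and overflow -> extent 22.
dense = np.zeros(22)
dense[1:21] = np.arange(1.0, 21.0)
h = MiniSparseHist.from_dense(dense)

# Pickling serializes only the stored entries, never the dense array.
h2 = pickle.loads(pickle.dumps(h))
print(h2.nnz, np.array_equal(h2.toarray(), dense))  # 20 True
```

Only `toarray()` ever touches the full dense size, which is exactly the step the sparse path is meant to avoid for large inputs.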

  • Add SparseHist wrapper combining a scipy sparse array with hist axes
    (c833677): stores the dense N-D shape implied by a sequence of hist
    axes in the with-flow layout (axis.extent per axis) and provides
    toarray and to_flat_csr methods that extract either the with-flow or
    the no-flow representation. Also supports dict-style slicing along
    axes by regular-bin index for use cases such as multi-systematic
    dispatch in rabbit.

  • Use int64 indices in SparseHist.to_flat_csr for large flat sizes
    (256be1f): previously the returned CSR always cast indices and
    indptr to int32, which silently overflowed when the flat target
    size exceeded the int32 range (max 2,147,483,647). This affected
    SparseHist instances built from large multi-axis inputs (e.g. an
    (eta, phi, pt, mass, corparms) hist with ~108k corparms, where the
    with-flow flat size is ~6.3 billion bins). The index arrays now
    switch to int64 whenever the target size does not fit in int32.

  • Protect against future incompatible change in hist (71f7eb6).
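The dict-style slicing mentioned in the first bullet can be pictured with the following hypothetical NumPy-only sketch (not the PR's implementation). It assumes entries are stored as flat indices into the with-flow layout, that each axis is described by a name, its with-flow extent and whether it has an underflow bin, and that a selection such as `{"syst": 2}` fixes one axis to a regular-bin index and drops that axis. `slice_sparse` is an illustrative name.

```python
import numpy as np


def slice_sparse(names, dense_shape, underflow, flat_indices, values, selection):
    """Hypothetical dict-style slice, e.g. selection={"syst": 2}.

    names, dense_shape (with-flow extents) and underflow describe each axis;
    each selected axis is fixed to a regular-bin index and removed. At least
    one axis must remain unselected.
    """
    coords = np.unravel_index(flat_indices, dense_shape)
    keep = np.ones(len(flat_indices), dtype=bool)
    for name, ibin in selection.items():
        ax = names.index(name)
        # Regular bin i sits at with-flow position i + 1 when there is underflow.
        keep &= coords[ax] == ibin + int(underflow[ax])
    remaining = [ax for ax, name in enumerate(names) if name not in selection]
    new_shape = tuple(dense_shape[ax] for ax in remaining)
    new_coords = tuple(coords[ax][keep] for ax in remaining)
    new_flat = np.ravel_multi_index(new_coords, new_shape)
    return new_shape, new_flat, np.asarray(values)[keep]


# Toy input: "x" has 3 regular bins plus under/overflow, "syst" has 4 bins, no flow.
names, underflow, dense_shape = ("x", "syst"), (True, False), (5, 4)
dense = np.arange(20, dtype=float).reshape(dense_shape)
dense[0, :] = 0.0  # empty underflow row, so it is not stored
flat = np.flatnonzero(dense)

shape, idx, vals = slice_sparse(
    names, dense_shape, underflow, flat, dense.ravel()[flat], {"syst": 2}
)
out = np.zeros(shape)
out.flat[idx] = vals
print(shape, np.array_equal(out, dense[:, 2]))  # (5,) True
```

Each slice only visits the stored entries, so taking one slice per systematic bin never requires densifying the full histogram.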

bendavid and others added 3 commits April 7, 2026 02:09
The wrapper stores the dense N-D shape implied by a sequence of hist axes
in the with-flow layout (axis.extent per axis) and provides toarray and
to_flat_csr methods that can extract either the with-flow or no-flow
representation. Also supports dict-style slicing along axes by regular-bin
index for use cases such as multi-systematic dispatch in rabbit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The CSR returned by to_flat_csr always cast indices and indptr to int32,
which silently overflowed when the flat target size exceeded the int32
range. This affected SparseHist instances built from large multi-axis
inputs (e.g. a (eta, phi, pt, mass, corparms) hist with ~108k corparms,
where the with-flow flat size is ~6.3 billion bins). Now switch to
int64 whenever the target size does not fit in int32.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
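A NumPy-only sketch of the int32 failure mode and of the dtype rule described in the second commit. For illustration only, it assumes the flat CSR is a single row of length `flat_size`; `index_dtype` and `to_flat_csr_parts` are hypothetical helpers, not functions from the PR.

```python
import numpy as np

INT32_MAX = np.iinfo(np.int32).max  # 2_147_483_647


def index_dtype(flat_size):
    """Hypothetical rule: int32 only if flat_size itself fits (slightly conservative)."""
    # For a single row, column indices reach flat_size - 1 and indptr reaches
    # nnz <= flat_size, so checking flat_size covers both arrays.
    return np.int32 if flat_size <= INT32_MAX else np.int64


def to_flat_csr_parts(flat_indices, values, flat_size):
    """Hypothetical helper: (data, indices, indptr) of a 1 x flat_size CSR row."""
    dtype = index_dtype(flat_size)
    order = np.argsort(flat_indices, kind="stable")  # canonical CSR: sorted columns
    indices = np.asarray(flat_indices)[order].astype(dtype)
    data = np.asarray(values)[order]
    indptr = np.array([0, indices.size], dtype=dtype)
    return data, indices, indptr


# The bug: an out-of-range index cast to int32 wraps around without any error.
big = np.array([6_300_000_000], dtype=np.int64)
print(big.astype(np.int32))  # [2005032704]

data, indices, indptr = to_flat_csr_parts(big, [1.0], flat_size=6_300_000_001)
print(indices.dtype, indices)  # int64 [6300000000]
```

The resulting `(data, indices, indptr)` triple has the layout accepted by `scipy.sparse.csr_array((data, indices, indptr), shape=(1, flat_size))`.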
return tuple.__getitem__(self, key)


class SparseHist:
Collaborator

Not sure if this is the idea, but if we want to use SparseHist as a drop-in replacement for a regular Hist object, we should give it the same attributes.

Right now "name" and "label" are the obvious ones missing.

There are also small differences, e.g. the .shape of a SparseHist includes the underflow/overflow bins, while the .shape of a regular Hist does not.
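The difference comes from the flow bins: each axis contributes size + underflow + overflow bins (its extent) to the with-flow layout, so the Regular(20, -5, 5) example below has a with-flow shape of (22,) against (20,) for a regular Hist. A hypothetical NumPy-only sketch, with plain (size, underflow, overflow) tuples standing in for hist axes, of computing both shapes and of re-indexing with-flow flat indices into the no-flow layout:

```python
import numpy as np


def withflow_shape(sizes, underflow, overflow):
    # extent = regular bins + one underflow bin + one overflow bin, if present
    return tuple(n + int(u) + int(o) for n, u, o in zip(sizes, underflow, overflow))


def to_noflow(flat_indices, values, sizes, underflow, overflow):
    """Drop flow-bin entries and re-index into the no-flow layout (hypothetical)."""
    coords = np.unravel_index(flat_indices, withflow_shape(sizes, underflow, overflow))
    # Shift so that regular bin 0 is coordinate 0 on every axis.
    shifted = [c - int(u) for c, u in zip(coords, underflow)]
    keep = np.ones(len(flat_indices), dtype=bool)
    for c, n in zip(shifted, sizes):
        keep &= (c >= 0) & (c < n)
    new_flat = np.ravel_multi_index(tuple(c[keep] for c in shifted), tuple(sizes))
    return new_flat, np.asarray(values)[keep]


# 1D case: Regular(20, -5, 5) has both underflow and overflow bins.
sizes, uf, of = (20,), (True,), (True,)
print(withflow_shape(sizes, uf, of), sizes)  # (22,) (20,)

# With-flow positions 0 and 21 are the flow bins and are dropped.
idx, vals = to_noflow(np.array([0, 1, 20, 21]), [9.0, 1.0, 2.0, 9.0], sizes, uf, of)
print(idx, vals)
```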

On the Hist object I can also do things like "h_dense.axes.name", which doesn't work for the SparseHist.

Methods like "fill" or "project" could be stubbed out as "NotImplemented" or "NotSupported".

Just for reference, this is the full list:

>>> h_sparse.__dir__()
['_axes', '_dense_shape', '_size', '_flat_indices', '_values', '__module__', '__firstlineno__', '__doc__', '_underflow_offset', '__init__', '_from_flat', 'axes', 'shape', 'dtype', 'nnz', 'toarray', 'tocoo', 'to_flat_csr', '__getitem__', '__static_attributes__', '__dict__', '__weakref__', '__new__', '__repr__', '__hash__', '__str__', '__getattribute__', '__setattr__', '__delattr__', '__lt__', '__le__', '__eq__', '__ne__', '__gt__', '__ge__', '__reduce_ex__', '__reduce__', '__getstate__', '__subclasshook__', '__init_subclass__', '__format__', '__sizeof__', '__dir__', '__class__']

>>> h_dense.__dir__()
['_variance_known', 'name', 'label', '__module__', '__firstlineno__', '__static_attributes__', '__orig_bases__', '__weakref__', '__doc__', '__parameters__', '_family', '__slots__', '__init__', '_generate_axes_', '_repr_html_', '_name_to_index', '_to_uhi_', 'from_columns', 'project', 'T', 'fill', 'fill_flattened', 'sort', '_convert_index_wildcards', '_loc_shortcut', '_step_shortcut', '_index_transform', '__getitem__', '__setitem__', 'profile', 'density', 'show', 'plot', 'plot1d', 'plot2d', 'plot2d_full', 'plot_ratio', 'plot_pull', 'plot_pie', 'stack', 'integrate', '__annotations__', '__init_subclass__', '_clone', '_new_hist', '_from_histogram_cpp', '_from_histogram_object', '_import_bh_', '_export_bh_', '__getattr__', '_from_uhi_', 'ndim', 'view', '__array__', '__hash__', '__eq__', '__ne__', '__add__', '__iadd__', '__radd__', '__sub__', '__isub__', '__mul__', '__rmul__', '__truediv__', '__div__', '__idiv__', '__itruediv__', '__imul__', '_compute_inplace_op', '__str__', '_axis', 'storage_type', '_storage_type', '_reduce', '__copy__', '__deepcopy__', '__getstate__', '__setstate__', '__repr__', '_compute_uhi_index', '_compute_commonindex', 'to_numpy', 'copy', 'reset', 'empty', 'sum', 'size', 'shape', '_handle_slice', '_rebin_with_groups', 'kind', 'values', 'variances', 'counts', '_hist', 'axes', '__dict__', '_types', '__class_getitem__', '__new__', '__getattribute__', '__setattr__', '__delattr__', '__lt__', '__le__', '__gt__', '__ge__', '__reduce_ex__', '__reduce__', '__subclasshook__', '__format__', '__sizeof__', '__dir__', '__class__']

>>> h_dense.__dict__
{'_variance_known': True, 'name': None, 'label': None}

>>> h_sparse.__dict__
{'_axes': (Regular(20, -5, 5, name='x'),), '_dense_shape': (22,), '_size': 22, '_flat_indices': array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20]), '_values': array([265., 235., 247., 249., 249., 263., 260., 265., 226., 248., 247.,
       254., 230., 227., 261., 246., 254., 283., 242., 249.])}
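One possible way to close part of this gap, sketched under the assumption that duck typing against a few Hist attributes is enough: store name and label as plain attributes (both default to None on the dense Hist above), wrap the axes in a tuple subclass that exposes per-axis attributes such as `axes.name`, and make unsupported methods raise NotImplementedError. `AxesTuple`, `SparseHistCompat` and the `Axis` namedtuple are hypothetical stand-ins, not hist classes or code from this PR.

```python
from collections import namedtuple

# Hypothetical stand-in for a hist axis; only the attributes used below.
Axis = namedtuple("Axis", ["name", "size"])


class AxesTuple(tuple):
    """Tuple of axes exposing per-axis attributes, like h.axes.name on a Hist."""

    @property
    def name(self):
        return tuple(ax.name for ax in self)

    @property
    def size(self):
        return tuple(ax.size for ax in self)


class SparseHistCompat:
    """Hypothetical sketch of Hist-like attributes for a sparse wrapper."""

    def __init__(self, axes, name=None, label=None):
        self._axes = AxesTuple(axes)
        self.name = name  # defaults match the dense Hist's __dict__ (None)
        self.label = label

    @property
    def axes(self):
        return self._axes

    def fill(self, *args, **kwargs):
        # Not supported in this sketch.
        raise NotImplementedError("fill is not supported for a sparse hist")

    def project(self, *args, **kwargs):
        raise NotImplementedError("project is not supported for a sparse hist")


h = SparseHistCompat([Axis("x", 20), Axis("syst", 4)], name="h")
print(h.axes.name, h.label)  # ('x', 'syst') None
```

Raising NotImplementedError, rather than returning the NotImplemented singleton (which Python reserves for binary operators), makes accidental use of an unsupported method fail loudly.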
